Sparse Conformal Predictors
Conformal predictors, introduced by Vovk et al. (2005), serve to build
prediction intervals by exploiting a notion of conformity of the new data point
with previously observed data. In the present paper, we propose a novel method
for constructing prediction intervals for the response variable in multivariate
linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable even if their number is very large. Our approach is based on combining the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε>0 and has a coverage probability larger than or equal to 1−ε. The numerical
experiments reported in the paper show that the length of the confidence set is
small. Furthermore, as a by-product of the proposed approach, we provide a
data-driven procedure for choosing the LASSO penalty. The selection power of
the method is illustrated on simulated data.
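As an illustration of the construction just described, here is a minimal sketch of a full conformal prediction interval built on top of scikit-learn's Lasso. The grid of candidate responses, the fixed penalty alpha, and the absolute-residual conformity score are illustrative assumptions, not the paper's exact choices; the returned interval is the convex hull of the accepted candidates.

```python
import numpy as np
from sklearn.linear_model import Lasso

def conformal_interval(X, y, x_new, eps=0.1, alpha=0.1, grid_size=200):
    """Full conformal prediction set for the response at x_new.

    Each candidate value z is tested by refitting the Lasso on the
    augmented sample (X, y) + (x_new, z); z is accepted when the new
    point's conformity score is not among the eps most extreme ones.
    """
    z_grid = np.linspace(y.min() - 3 * y.std(), y.max() + 3 * y.std(), grid_size)
    X_aug = np.vstack([X, x_new])
    accepted = []
    for z in z_grid:
        y_aug = np.append(y, z)
        model = Lasso(alpha=alpha).fit(X_aug, y_aug)
        scores = np.abs(y_aug - model.predict(X_aug))  # conformity scores
        p_value = np.mean(scores >= scores[-1])        # rank of the new point
        if p_value > eps:                              # keep conforming z
            accepted.append(z)
    return (min(accepted), max(accepted)) if accepted else None
```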
Transductive versions of the LASSO and the Dantzig Selector
We consider the linear regression problem, where the number p of covariates is possibly larger than the number n of observations, under sparsity assumptions. On the one hand, several methods have
been successfully proposed to perform this task, for example the LASSO or the
Dantzig Selector. On the other hand, consider new values x_{n+1}, ..., x_{n+m} of the covariates. If one wants to estimate the corresponding y_i's, one should think of a specific estimator devoted to this task, referred to by Vapnik as a "transductive" estimator. This estimator may differ from an estimator designed for the more general task "estimate on the whole domain". In this work, we propose a generalized version of both the LASSO and the Dantzig Selector, based on geometrical remarks about the LASSO in previous works. The "usual"
LASSO and Dantzig Selector, as well as new estimators interpreted as
transductive versions of the LASSO, appear as special cases. These estimators
are interesting at least from a theoretical point of view: we can give
theoretical guarantees for these estimators under hypotheses that are relaxed
versions of the hypotheses required in the papers about the "usual" LASSO.
These estimators can also be efficiently computed, with results comparable to those of the LASSO.
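For concreteness, the "usual" Dantzig Selector that these transductive estimators generalize can itself be computed as a linear program; below is a small sketch with SciPy. The tuning level s and the dense LP formulation are illustrative assumptions, not the paper's transductive construction.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, s):
    """Dantzig Selector: minimize ||b||_1  s.t.  ||X'(y - Xb)||_inf <= s.

    Standard LP reformulation with b = u - v, u >= 0, v >= 0, so that
    sum(u + v) equals ||b||_1 at the optimum.
    """
    n, p = X.shape
    G = X.T @ X
    c = np.ones(2 * p)  # objective: sum(u) + sum(v)
    # -s <= X'y - G(u - v) <= s, rewritten as two stacks of <= constraints
    A_ub = np.vstack([np.hstack([-G, G]), np.hstack([G, -G])])
    b_ub = np.concatenate([s - X.T @ y, s + X.T @ y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```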
Consistency of plug-in confidence sets for classification in semi-supervised learning
Confident prediction is highly relevant in machine learning; for example, in
applications such as medical diagnoses, a wrong prediction can be fatal. For classification, there already exist procedures that allow one to abstain from classifying data when the confidence in the prediction is weak. This approach is known as classification with reject option. In the present paper, we provide a new methodology for this approach. Predicting a new instance via a confidence set, we ensure an exact control of the probability of classification. Moreover, we show that this methodology is easily implementable and enjoys attractive theoretical and numerical properties.
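A minimal plug-in sketch of the confidence-set idea follows; the threshold, model, and synthetic data are illustrative assumptions, and the paper's semi-supervised calibration of the threshold is not reproduced. Each point receives the set of labels whose estimated posterior is large enough, and a non-singleton set amounts to abstaining from a hard classification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def confidence_set_predict(clf, X, threshold=0.4):
    """Return, for each row of X, every label whose estimated posterior
    probability exceeds `threshold`; sets with several labels play the
    role of a reject option."""
    proba = clf.predict_proba(X)
    return [list(clf.classes_[row >= threshold]) for row in proba]

# toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
clf = LogisticRegression().fit(X, y)
print(confidence_set_predict(clf, X[:5]))  # e.g. [[0], [0, 1], [1], ...]
```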
Sparse conformal predictors: SCP
Conformal predictors, introduced by Vovk et al. (Algorithmic Learning in a Random World, Springer, New York, 2005), serve to build prediction intervals by exploiting a notion of conformity of the new data point with previously observed data. We propose a novel method for constructing prediction intervals for the response variable in multivariate linear models. The main emphasis is on sparse linear models, where only a few of the covariates have a significant influence on the response variable even if the total number of covariates is very large. Our approach is based on combining the principle of conformal prediction with the ℓ1-penalized least squares estimator (LASSO). The resulting confidence set depends on a parameter ε>0 and has a coverage probability larger than or equal to 1−ε. The numerical experiments reported in the paper show that the length of the confidence set is small. Furthermore, as a by-product of the proposed approach, we provide a data-driven procedure for choosing the LASSO penalty. The selection power of the method is illustrated on simulated and real data.
How Correlations Influence Lasso Prediction
We study how correlations in the design matrix influence Lasso prediction.
First, we argue that the higher the correlations are, the smaller the optimal
tuning parameter is. This implies in particular that the standard tuning parameters, which do not depend on the design matrix, are not favorable.
Furthermore, we argue that Lasso prediction works well for any degree of correlation if suitable tuning parameters are chosen. We study these two subjects both theoretically and with simulations.
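A small simulation in the spirit of the first claim (the design, noise level, and the use of cross-validation as a stand-in for optimal tuning are all illustrative assumptions): as the equicorrelation level of the design grows, the cross-validated Lasso penalty tends to shrink.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 100, 50
beta = np.zeros(p)
beta[:5] = 1.0  # sparse truth

for rho in (0.0, 0.5, 0.9):
    # equicorrelated design: pairwise correlation rho, unit variances
    z = rng.normal(size=(n, 1))
    X = np.sqrt(rho) * z + np.sqrt(1 - rho) * rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    alpha = LassoCV(cv=5).fit(X, y).alpha_
    print(f"rho={rho:.1f}  cross-validated alpha={alpha:.4f}")
```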
On Lasso refitting strategies
A well-known drawback of ℓ1-penalized estimators is the systematic shrinkage of large coefficients towards zero. A simple remedy is to treat the Lasso as a
model-selection procedure and to perform a second refitting step on the
selected support. In this work we formalize the notion of refitting and provide
oracle bounds for arbitrary refitting procedures of the Lasso solution. One of the most widely used refitting techniques, the one based on least squares, may bring a problem of interpretability, since the signs of the refitted estimator might be flipped with respect to the original estimator. This problem arises from the fact that least-squares refitting considers only the support of the Lasso solution, discarding any information about signs or amplitudes. To address this, we define a sign-consistent refitting as an arbitrary refitting procedure that preserves the signs of the first-step Lasso solution, and we provide oracle inequalities for such estimators. We then consider two special refitting strategies: the Bregman Lasso and the Boosted Lasso. The Bregman Lasso has the fruitful property of converging to the Sign-Least-Squares refitting (least squares with sign constraints), which provides greater interpretability. We additionally study the Bregman Lasso refitting in the case of an orthogonal design, providing simple intuition behind the proposed method. The Boosted Lasso, in contrast, uses information about the magnitudes of the first Lasso step and allows us to derive better oracle rates for prediction. Finally, we conduct an extensive numerical study to show the advantages of one approach over the others in different synthetic and semi-real scenarios.
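The two refitting notions above admit a compact sketch (the penalty level and scalings are illustrative assumptions): a plain least-squares refit on the Lasso support, which may flip signs, and a Sign-Least-Squares refit obtained by flipping each selected column by its Lasso sign and solving a non-negative least-squares problem.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import Lasso

def refit_lasso(X, y, alpha=0.1):
    """Least-squares and sign-consistent refits of a first-step Lasso."""
    lasso = Lasso(alpha=alpha).fit(X, y)
    support = np.flatnonzero(lasso.coef_)
    signs = np.sign(lasso.coef_[support])

    # 1) plain least-squares refit on the support: signs may flip
    beta_ls = np.zeros(X.shape[1])
    beta_ls[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]

    # 2) Sign-Least-Squares refit: least squares under the constraint that
    #    sign(beta_j) agrees with the Lasso; writing beta_j = signs_j * g_j
    #    with g_j >= 0 turns this into a non-negative least-squares problem
    g, _ = nnls(X[:, support] * signs, y)
    beta_sls = np.zeros(X.shape[1])
    beta_sls[support] = signs * g
    return beta_ls, beta_sls
```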
The Smooth-Lasso and other ℓ1+ℓ2-penalized methods
We consider a linear regression problem in a high dimensional setting where the number p of covariates can be much larger than the sample size n. In such a situation, one often assumes sparsity of the regression vector, i.e., the regression vector contains many zero components. We propose a Lasso-type estimator β̂^Quad (where "Quad" stands for quadratic) which is based on two penalty terms. The first one is the ℓ1 norm of the
regression coefficients used to exploit the sparsity of the regression as done
by the Lasso estimator, whereas the second is a quadratic penalty term
introduced to capture some additional information on the setting of the
problem. We detail two special cases: the Elastic-Net, which deals with sparse problems where correlations between variables may exist; and the Smooth-Lasso, which responds to sparse problems where
successive regression coefficients are known to vary slowly (in some
situations, this can also be interpreted in terms of correlations between
successive variables). From a theoretical point of view, we establish variable
selection consistency results and show that β̂^Quad achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero
components of the 'true' regression vector. These results are provided under a
weaker assumption on the Gram matrix than the one used by the Lasso. In some
situations this guarantees a significant improvement over the Lasso.
Furthermore, a simulation study is conducted and shows that the S-Lasso performs better than known methods such as the Lasso, the Elastic-Net, and the Fused-Lasso with respect to estimation accuracy. This is especially the case when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters and the one based on 10-fold cross-validation yield two S-Lasso solutions with close performance.
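One way to see how the two penalty terms combine (a sketch using the standard data-augmentation trick, not the paper's own algorithm; the penalty levels are illustrative): the quadratic smoothness term lam2 * Σ_j (β_{j+1} − β_j)² can be folded into the least-squares part by stacking a scaled first-difference matrix under X, after which any plain Lasso solver applies. Replacing the difference matrix by the identity gives the Elastic-Net case.

```python
import numpy as np
from sklearn.linear_model import Lasso

def smooth_lasso(X, y, lam1=0.1, lam2=1.0):
    """Sketch of ||y - Xb||^2 + lam2 * sum_j (b_{j+1} - b_j)^2 + lam1 * ||b||_1
    solved as a plain Lasso on augmented data."""
    n, p = X.shape
    D = np.diff(np.eye(p), axis=0)             # (p-1) x p first differences
    X_aug = np.vstack([X, np.sqrt(lam2) * D])  # quadratic penalty as extra rows
    y_aug = np.concatenate([y, np.zeros(p - 1)])
    # sklearn minimizes (1/(2m)) ||y - Xb||^2 + a * ||b||_1 on m rows,
    # so a = lam1 / (2m) matches the objective above after rescaling by 2m
    m = n + p - 1
    return Lasso(alpha=lam1 / (2 * m), fit_intercept=False).fit(X_aug, y_aug).coef_
```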
Generalization of ℓ1 constraints for high dimensional regression problems
We focus on the high dimensional linear regression model, where β* ∈ ℝ^p is the parameter of interest. In this setting,
several estimators such as the LASSO and the Dantzig Selector are known to
satisfy interesting properties whenever the vector β* is sparse.
Interestingly, both the LASSO and the Dantzig Selector can be seen as projections of 0 onto DC(s) = {β ∈ ℝ^p : ‖X'(Y − Xβ)‖_∞ ≤ s}, using an ℓ1 distance for the Dantzig Selector and an ℓ2 distance for the LASSO. For a well chosen s > 0, this set is actually a confidence region for β*. In this paper, we investigate the properties of estimators defined as projections onto DC(s) using general distances. We prove that the
obtained estimators satisfy oracle properties close to those of the LASSO and the Dantzig Selector. On top of that, it turns out that these estimators can be tuned to exploit different sparsity patterns and/or slightly different estimation objectives.
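A quick numerical check of the geometric fact above (the scalings below are sklearn-specific assumptions): for sklearn's objective, the Lasso solution at penalty level a satisfies ‖X'(Y − Xβ̂)‖_∞ ≤ n·a up to solver tolerance, i.e. it lies in DC(s) with s = n·a.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 80, 120  # high dimensional setting: p > n
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = 2.0
y = X @ beta_true + rng.normal(size=n)

a = 0.2
b = Lasso(alpha=a, fit_intercept=False).fit(X, y).coef_
# KKT conditions for (1/(2n)) ||y - Xb||^2 + a ||b||_1 give
# ||X'(y - Xb)||_inf <= n * a, i.e. b belongs to DC(n * a)
print(np.max(np.abs(X.T @ (y - X @ b))), "<=", n * a)
```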